基于中心的聚类(例如,$ k $ -means,$ k $ -Medians)和使用线性子空间的聚类是两种最受欢迎的技术,可以将真实数据分配到较小的群集中。但是,当数据由敏感人群组组成时,不同敏感组的每点的聚集成本显着不同,可能会导致与公平相关的危害(例如,服务质量不同)。社会公平聚类的目的是最大程度地降低所有组中每点聚类的最大成本。在这项工作中,我们提出了一个统一的框架,以解决社会公平的基于中心的聚类和线性子空间聚类,并为这些问题提供实用,高效的近似算法。我们进行了广泛的实验,以表明在多个基准数据集上,我们的算法要么紧密匹配或超越最先进的基线。
translated by 谷歌翻译
我们解决了视频动作识别的数据增强问题。视频中的标准增强策略是手工设计的,并随机对可能的增强数据点的空间进行采样,而不知道哪个增强点会更好,或者是通过启发式方法会更好。我们建议学习是什么使良好的视频供行动识别,并仅选择高质量的样本进行增强。特别是,我们选择前景和背景视频的视频合成作为数据增强过程,从而导致各种新样本。我们了解了哪对视频要增加,而无需实际综合它们。这降低了可能的增强空间,这具有两个优势:它节省了计算成本并提高了最终训练的分类器的准确性,因为增强对的质量高于平均水平。我们在整个训练环境中介绍了实验结果:几乎没有射击,半监督和完全监督。我们观察到所有这些都对动力学,UCF101,HMDB51的基准进行了一致的改进,并在设置上实现了有限数据的新最新设置。在半监督环境中,我们看到高达8.6%的改善。
translated by 谷歌翻译
零射击动作识别是识别无视觉示例的识别性类别的任务,只有在没有看到看到的类别的seman-tic嵌入方式。问题可以看作是学习一个函数,该函数可以很好地讲述不见的阶级实例,而不会在类之间失去歧视。神经网络可以模拟视觉类别之间的复杂边界,从而将其作为监督模型的成功范围。但是,这些高度专业化的类边界可能不会从看不见的班级转移到看不见的类别。在本文中,我们提出了基于质心的表示,该表示将视觉和语义表示,同时考虑所有训练样本,通过这种方式,对看不见的课程的实例很好。我们使用强化学习对群集进行优化,这对我们的工作方法表明了至关重要的。我们称提出的甲壳类动物的命名为Claster,并观察到它在所有标准数据集中始终超过最先进的方法,包括UCF101,HMDB51和奥运会运动;在Thestandard Zero-shot评估和广义零射击学习中。此外,我们表明我们的模型在图像域也可以进行com的性能,在许多设置中表现出色。
translated by 谷歌翻译
While the brain connectivity network can inform the understanding and diagnosis of developmental dyslexia, its cause-effect relationships have not yet enough been examined. Employing electroencephalography signals and band-limited white noise stimulus at 4.8 Hz (prosodic-syllabic frequency), we measure the phase Granger causalities among channels to identify differences between dyslexic learners and controls, thereby proposing a method to calculate directional connectivity. As causal relationships run in both directions, we explore three scenarios, namely channels' activity as sources, as sinks, and in total. Our proposed method can be used for both classification and exploratory analysis. In all scenarios, we find confirmation of the established right-lateralized Theta sampling network anomaly, in line with the temporal sampling framework's assumption of oscillatory differences in the Theta and Gamma bands. Further, we show that this anomaly primarily occurs in the causal relationships of channels acting as sinks, where it is significantly more pronounced than when only total activity is observed. In the sink scenario, our classifier obtains 0.84 and 0.88 accuracy and 0.87 and 0.93 AUC for the Theta and Gamma bands, respectively.
translated by 谷歌翻译
Differentiable Architecture Search (DARTS) has attracted considerable attention as a gradient-based Neural Architecture Search (NAS) method. Since the introduction of DARTS, there has been little work done on adapting the action space based on state-of-art architecture design principles for CNNs. In this work, we aim to address this gap by incrementally augmenting the DARTS search space with micro-design changes inspired by ConvNeXt and studying the trade-off between accuracy, evaluation layer count, and computational cost. To this end, we introduce the Pseudo-Inverted Bottleneck conv block intending to reduce the computational footprint of the inverted bottleneck block proposed in ConvNeXt. Our proposed architecture is much less sensitive to evaluation layer count and outperforms a DARTS network with similar size significantly, at layer counts as small as 2. Furthermore, with less layers, not only does it achieve higher accuracy with lower GMACs and parameter count, GradCAM comparisons show that our network is able to better detect distinctive features of target objects compared to DARTS.
translated by 谷歌翻译
We propose an ensemble approach to predict the labels in linear programming word problems. The entity identification and the meaning representation are two types of tasks to be solved in the NL4Opt competition. We propose the ensembleCRF method to identify the named entities for the first task. We found that single models didn't improve for the given task in our analysis. A set of prediction models predict the entities. The generated results are combined to form a consensus result in the ensembleCRF method. We present an ensemble text generator to produce the representation sentences for the second task. We thought of dividing the problem into multiple small tasks due to the overflow in the output. A single model generates different representations based on the prompt. All the generated text is combined to form an ensemble and produce a mathematical meaning of a linear programming problem.
translated by 谷歌翻译
Diabetic Retinopathy (DR) is a leading cause of vision loss in the world, and early DR detection is necessary to prevent vision loss and support an appropriate treatment. In this work, we leverage interactive machine learning and introduce a joint learning framework, termed DRG-Net, to effectively learn both disease grading and multi-lesion segmentation. Our DRG-Net consists of two modules: (i) DRG-AI-System to classify DR Grading, localize lesion areas, and provide visual explanations; (ii) DRG-Expert-Interaction to receive feedback from user-expert and improve the DRG-AI-System. To deal with sparse data, we utilize transfer learning mechanisms to extract invariant feature representations by using Wasserstein distance and adversarial learning-based entropy minimization. Besides, we propose a novel attention strategy at both low- and high-level features to automatically select the most significant lesion information and provide explainable properties. In terms of human interaction, we further develop DRG-Net as a tool that enables expert users to correct the system's predictions, which may then be used to update the system as a whole. Moreover, thanks to the attention mechanism and loss functions constraint between lesion features and classification features, our approach can be robust given a certain level of noise in the feedback of users. We have benchmarked DRG-Net on the two largest DR datasets, i.e., IDRID and FGADR, and compared it to various state-of-the-art deep learning networks. In addition to outperforming other SOTA approaches, DRG-Net is effectively updated using user feedback, even in a weakly-supervised manner.
translated by 谷歌翻译
This paper deals with the problem of statistical and system heterogeneity in a cross-silo Federated Learning (FL) framework where there exist a limited number of Consumer Internet of Things (CIoT) devices in a smart building. We propose a novel Graph Signal Processing (GSP)-inspired aggregation rule based on graph filtering dubbed ``G-Fedfilt''. The proposed aggregator enables a structured flow of information based on the graph's topology. This behavior allows capturing the interconnection of CIoT devices and training domain-specific models. The embedded graph filter is equipped with a tunable parameter which enables a continuous trade-off between domain-agnostic and domain-specific FL. In the case of domain-agnostic, it forces G-Fedfilt to act similar to the conventional Federated Averaging (FedAvg) aggregation rule. The proposed G-Fedfilt also enables an intrinsic smooth clustering based on the graph connectivity without explicitly specified which further boosts the personalization of the models in the framework. In addition, the proposed scheme enjoys a communication-efficient time-scheduling to alleviate the system heterogeneity. This is accomplished by adaptively adjusting the amount of training data samples and sparsity of the models' gradients to reduce communication desynchronization and latency. Simulation results show that the proposed G-Fedfilt achieves up to $3.99\% $ better classification accuracy than the conventional FedAvg when concerning model personalization on the statistically heterogeneous local datasets, while it is capable of yielding up to $2.41\%$ higher accuracy than FedAvg in the case of testing the generalization of the models.
translated by 谷歌翻译
Previous work has shown the potential of deep learning to predict renal obstruction using kidney ultrasound images. However, these image-based classifiers have been trained with the goal of single-visit inference in mind. We compare methods from video action recognition (i.e. convolutional pooling, LSTM, TSM) to adapt single-visit convolutional models to handle multiple visit inference. We demonstrate that incorporating images from a patient's past hospital visits provides only a small benefit for the prediction of obstructive hydronephrosis. Therefore, inclusion of prior ultrasounds is beneficial, but prediction based on the latest ultrasound is sufficient for patient risk stratification.
translated by 谷歌翻译
We investigate data-driven texture modeling via analysis and synthesis with generative adversarial networks. For network training and testing, we have compiled a diverse set of spatially homogeneous textures, ranging from stochastic to regular. We adopt StyleGAN3 for synthesis and demonstrate that it produces diverse textures beyond those represented in the training data. For texture analysis, we propose GAN inversion using a novel latent domain reconstruction consistency criterion for synthesized textures, and iterative refinement with Gramian loss for real textures. We propose perceptual procedures for evaluating network capabilities, exploring the global and local behavior of latent space trajectories, and comparing with existing texture analysis-synthesis techniques.
translated by 谷歌翻译